Overview

Brought to you by YData

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   433         449
Missing cells (%)               8.1%        8.4%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
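
Two of the figures above can be derived from the others. A minimal Python sketch, assuming that Missing cells (%) divides missing cells by all cells (variables × observations) and that the average record size is the total in-memory size divided by the number of observations. Both definitions are inferred from the numbers in this table, not taken from ydata-profiling's source; the helper names are illustrative.

```python
# Recompute derived figures from the Dataset statistics table.
# Assumed definitions (inferred from the reported numbers):
#   Missing cells (%)         = missing cells / (variables * observations)
#   Average record size (B)   = total size in bytes / observations

def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells as a percentage of all cells."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average in-memory record size in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_obs


pct_a = missing_cells_pct(433, 12, 446)
pct_b = missing_cells_pct(449, 12, 446)
record_size = avg_record_size_bytes(45.3, 446)

print(f"Missing cells: A {pct_a:.1f}%, B {pct_b:.1f}%")
print(f"Average record size: {record_size:.1f} B")
```

Because the reported 45.3 KiB is itself rounded, the recomputed record size only matches to the displayed precision.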

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                                        Dataset B
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High correlation
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High correlation
Age has 82 (18.4%) missing values                Age has 94 (21.1%) missing values                Missing
Cabin has 350 (78.5%) missing values             Cabin has 354 (79.4%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 300 (67.3%) zeros                      SibSp has 302 (67.7%) zeros                      Zeros
Parch has 341 (76.5%) zeros                      Parch has 334 (74.9%) zeros                      Zeros
Fare has 10 (2.2%) zeros                         Fare has 9 (2.0%) zeros                          Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2025-03-18 21:18:17.884531   2025-03-18 21:18:19.968246
Analysis finished        2025-03-18 21:18:19.965425   2025-03-18 21:18:21.993726
Duration                 2.08 seconds                 2.03 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json

Variables

PassengerId
Real number (ℝ)

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           453.23318   443.73094
Minimum        1           1
Maximum        890         891
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     1           1
5-th percentile             45.75       36.25
Q1                          240.5       214.5
median                      447.5       453.5
Q3                          674.5       663.5
95-th percentile            846.25      843.25
Maximum                     890         891
Range                       889         890
Interquartile range (IQR)   434         449

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                256.61237       259.5166
Coefficient of variation (CV)     0.56618178      0.58485126
Kurtosis                          -1.1829226      -1.196814
Mean                              453.23318       443.73094
Median Absolute Deviation (MAD)   219             218.5
Skewness                          -0.036868491    -0.037108928
Sum                               202142          197904
Variance                          65849.91        67348.867
Monotonicity                      Not monotonic   Not monotonic
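
The quantile and descriptive statistics are linked by standard identities. This sketch recomputes several of them for PassengerId in Dataset A using only values printed in the tables above; it illustrates the textbook definitions and is not ydata-profiling's own code, so tiny discrepancies would come from rounding in the printed inputs.

```python
# PassengerId, Dataset A: inputs copied from the report tables.
n = 446                # non-missing values (Missing = 0)
total = 202142         # Sum
std = 256.61237        # Standard deviation
q1, q3 = 240.5, 674.5  # Q1 and Q3
minimum, maximum = 1, 890

mean = total / n                 # Mean = Sum / n
variance = std ** 2              # Variance = standard deviation squared
cv = std / mean                  # Coefficient of variation = std / mean
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.5f} variance={variance:.2f} cv={cv:.8f}")
print(f"IQR={iqr} range={value_range}")
```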
(Figure: histogram with fixed size bins, bins=50)
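
A fixed size (equal-width) histogram splits the range from minimum to maximum into bins of identical width. The sketch below assumes that layout, which is how numpy.histogram behaves for an integer bin count (last bin closed on the right); equal_width_edges and bin_index are illustrative helper names, not ydata-profiling functions. It uses PassengerId's Dataset A range of 1 to 890.

```python
def equal_width_edges(lo: float, hi: float, bins: int) -> list:
    """Return bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)]
    edges.append(hi)  # pin the last edge to avoid floating-point drift
    return edges


def bin_index(x: float, edges: list) -> int:
    """Index of the bin holding x; the last bin also includes the maximum."""
    bins = len(edges) - 1
    width = (edges[-1] - edges[0]) / bins
    i = int((x - edges[0]) // width)
    return min(max(i, 0), bins - 1)  # clamp so x == max lands in the last bin


edges = equal_width_edges(1, 890, 50)
print(f"bin width = {edges[1] - edges[0]:.2f}")
print(bin_index(1, edges), bin_index(450, edges), bin_index(890, edges))
```

Clamping the index keeps the maximum value in the final bin, mirroring the closed right edge of the last bin.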

Common values

Dataset A
Value                Count   Frequency (%)
572                  1       0.2%
624                  1       0.2%
886                  1       0.2%
379                  1       0.2%
853                  1       0.2%
127                  1       0.2%
319                  1       0.2%
733                  1       0.2%
517                  1       0.2%
235                  1       0.2%
Other values (436)   436     97.8%

Dataset B
Value                Count   Frequency (%)
220                  1       0.2%
173                  1       0.2%
27                   1       0.2%
756                  1       0.2%
707                  1       0.2%
592                  1       0.2%
798                  1       0.2%
148                  1       0.2%
424                  1       0.2%
518                  1       0.2%
Other values (436)   436     97.8%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
1       1       0.2%
4       1       0.2%
5       1       0.2%
6       1       0.2%
9       1       0.2%
11      1       0.2%
12      1       0.2%
17      1       0.2%
19      1       0.2%
21      1       0.2%

Dataset B
Value   Count   Frequency (%)
1       1       0.2%
6       1       0.2%
7       1       0.2%
8       1       0.2%
11      1       0.2%
12      1       0.2%
14      1       0.2%
16      1       0.2%
18      1       0.2%
19      1       0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
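
Python's standard unicodedata module exposes the general category of each code point, one of the properties this section summarizes. The sketch below counts categories for the first Name sample in Dataset A. Script and block lookups are not available in unicodedata and would need extra data, so they are left out; note also that this report lists every character under (unknown).

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. Lu, Ll, Zs, Po)."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("Hansen, Mr. Henry Damsgaard")
for category, n in counts.most_common():
    print(category, n)
```

Here Lu is an uppercase letter, Ll a lowercase letter, Zs a space separator and Po other punctuation (the comma and the period).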

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   0           1
2nd row   0           0
3rd row   0           1
4th row   0           1
5th row   0           1

Common Values

Dataset A
Value   Count   Frequency (%)
0       274     61.4%
1       172     38.6%

Dataset B
Value   Count   Frequency (%)
0       275     61.7%
1       171     38.3%
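
Each Frequency (%) above is the count divided by the number of rows. This sketch rebuilds Survived from its counts (a hypothetical reconstruction, since the raw rows are not part of this report) and recomputes the table with collections.Counter; frequency_table is an illustrative name.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, round(100 * c / n, 1)) for v, c in counts.most_common()]


# Hypothetical reconstructions from the reported counts.
table_a = frequency_table(["0"] * 274 + ["1"] * 172)
table_b = frequency_table(["0"] * 275 + ["1"] * 171)

for row in table_a:
    print(*row)
```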

Length

(Figure: histogram of lengths of the category)

Common Values (Plot)

(Figures: common values plots for Dataset A and Dataset B)

Most occurring characters

Dataset A
Value   Count   Frequency (%)
0       274     61.4%
1       172     38.6%

Dataset B
Value   Count   Frequency (%)
0       275     61.7%
1       171     38.3%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   3           3
2nd row   3           3
3rd row   3           2
4th row   3           2
5th row   3           1

Common Values

Dataset A
Value   Count   Frequency (%)
3       244     54.7%
1       108     24.2%
2       94      21.1%

Dataset B
Value   Count   Frequency (%)
3       253     56.7%
1       98      22.0%
2       95      21.3%

Length

(Figure: histogram of lengths of the category)

Common Values (Plot)

(Figures: common values plots for Dataset A and Dataset B)

Most occurring characters

Dataset A
Value   Count   Frequency (%)
3       244     54.7%
1       108     24.2%
2       94      21.1%

Dataset B
Value   Count   Frequency (%)
3       253     56.7%
1       98      22.0%
2       95      21.3%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          67
Median length   48          49
Mean length     27.345291   26.847534
Min length      12          12
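
Mean length appears to be total characters divided by the number of non-missing values. The check below uses totals printed in this report: Name (12196 and 11974 characters over 446 names) and, as a second case, Cabin in Dataset A, where the divisor is 446 - 350 = 96 present values and the result matches its reported mean length of 3.65625 exactly. mean_length is an illustrative helper.

```python
def mean_length(total_chars: int, n_present: int) -> float:
    """Average string length over non-missing values."""
    return total_chars / n_present


mean_a = mean_length(12196, 446)          # Name, Dataset A
mean_b = mean_length(11974, 446)          # Name, Dataset B
mean_cabin_a = mean_length(351, 446 - 350)  # Cabin, Dataset A (350 missing)

print(round(mean_a, 6), round(mean_b, 6), mean_cabin_a)
```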

Characters and Unicode

                      Dataset A   Dataset B
Total characters      12196       11974
Distinct characters   60          60
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                              Dataset B
1st row   Hansen, Mr. Henry Damsgaard            Johnson, Miss. Eleanor Ileen
2nd row   Rice, Mrs. William (Margaret Norton)   Emir, Mr. Farred Chehab
3rd row   Betros, Mr. Tannous                    Hamalainen, Master. Viljo
4th row   Boulos, Miss. Nourelain                Kelly, Mrs. Florence "Fannie"
5th row   McMahon, Mr. Martin                    Stephenson, Mrs. Walter Bertram (Martha Eustis)

Words

Dataset A
Value                Count   Frequency (%)
mr                   251     13.6%
miss                 91      4.9%
mrs                  73      4.0%
william              37      2.0%
john                 23      1.2%
henry                21      1.1%
master               19      1.0%
mary                 13      0.7%
george               12      0.6%
james                11      0.6%
Other values (899)   1296    70.2%

Dataset B
Value                Count   Frequency (%)
mr                   260     14.4%
miss                 94      5.2%
mrs                  67      3.7%
william              23      1.3%
john                 23      1.3%
master               18      1.0%
henry                17      0.9%
george               14      0.8%
charles              12      0.7%
james                12      0.7%
Other values (876)   1267    70.1%
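
The word tables lowercase the text and drop surrounding punctuation (Mr. becomes mr). A hypothetical tokenizer that splits on whitespace and strips leading and trailing punctuation reproduces tokens seen in this report, such as mr here and c.a, w./c and ston/o in the Ticket word tables; the report's actual tokenization rules are not documented in this file, and the ticket strings below are illustrative inputs, not report rows.

```python
import string
from collections import Counter


def word_counts(texts):
    """Hypothetical tokenizer: lowercase, split on whitespace, strip edge punctuation."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


names = [
    "Hansen, Mr. Henry Damsgaard",
    "Rice, Mrs. William (Margaret Norton)",
    "Betros, Mr. Tannous",
]
tickets = ["C.A. 2343", "W./C. 6608", "STON/O 2. 3101294"]  # illustrative inputs

print(word_counts(names).most_common(3))
print(sorted(word_counts(tickets)))
```

Stripping only the ends of each token keeps internal punctuation, which is why c.a and a/5 survive as single words.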

Most occurring characters

Dataset A
Value               Count   Frequency (%)
(space)             1401    11.5%
r                   992     8.1%
e                   873     7.2%
a                   834     6.8%
i                   683     5.6%
n                   642     5.3%
s                   637     5.2%
l                   583     4.8%
M                   572     4.7%
o                   532     4.4%
Other values (50)   4447    36.5%

Dataset B
Value               Count   Frequency (%)
(space)             1363    11.4%
r                   988     8.3%
e                   856     7.1%
a                   844     7.0%
n                   642     5.4%
s                   632     5.3%
i                   629     5.3%
M                   576     4.8%
l                   524     4.4%
o                   490     4.1%
Other values (50)   4430    37.0%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   12196   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   11974   100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   12196   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   11974   100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   12196   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   11974   100.0%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7443946   4.7264574
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2116        2108
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        female
2nd row   female      male
3rd row   male        male
4th row   female      female
5th row   male        female

Common Values

Dataset A
Value    Count   Frequency (%)
male     280     62.8%
female   166     37.2%

Dataset B
Value    Count   Frequency (%)
male     284     63.7%
female   162     36.3%

Length

(Figure: histogram of lengths of the category)

Common Values (Plot)

(Figures: common values plots for Dataset A and Dataset B)

Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       612     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       166     7.8%

Dataset B
Value   Count   Frequency (%)
e       608     28.8%
m       446     21.2%
a       446     21.2%
l       446     21.2%
f       162     7.7%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   2116    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2108    100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   2116    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2108    100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   2116    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2108    100.0%

Age
Real number (ℝ)

               Dataset A   Dataset B
Distinct       72          79
Distinct (%)   19.8%       22.4%
Missing        82          94
Missing (%)    18.4%       21.1%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.878434   29.587614
Minimum        0.83        0.42
Maximum        74          74
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.83        0.42
5-th percentile             4           4
Q1                          20.875      21
median                      28          28.75
Q3                          38          38
95-th percentile            56          58
Maximum                     74          74
Range                       73.17       73.58
Interquartile range (IQR)   17.125      17

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                14.433438       14.619999
Coefficient of variation (CV)     0.48307209      0.49412565
Kurtosis                          0.15944868      0.19413525
Mean                              29.878434       29.587614
Median Absolute Deviation (MAD)   8               8.25
Skewness                          0.39265655      0.35147572
Sum                               10875.75        10414.84
Variance                          208.32412       213.74437
Monotonicity                      Not monotonic   Not monotonic
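
The Median Absolute Deviation (MAD) is the median of the absolute deviations from the median. This sketch assumes the unscaled form (no 1.4826 consistency factor) and that missing values, represented here as None, are dropped first; median_absolute_deviation is an illustrative name, not a ydata-profiling function.

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(x)| over non-missing values."""
    present = [v for v in values if v is not None]
    center = statistics.median(present)
    return statistics.median(abs(v - center) for v in present)


print(median_absolute_deviation([1, 2, 3, 4, 100, None]))
```

Unlike the standard deviation, the single outlier (100) barely moves the MAD, which is why it is reported alongside the moment-based statistics.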
(Figure: histogram with fixed size bins, bins=50)

Common values

Dataset A
Value               Count   Frequency (%)
24                  18      4.0%
18                  16      3.6%
28                  15      3.4%
22                  15      3.4%
35                  15      3.4%
19                  13      2.9%
36                  12      2.7%
30                  12      2.7%
23                  11      2.5%
25                  10      2.2%
Other values (62)   227     50.9%
(Missing)           82      18.4%

Dataset B
Value               Count   Frequency (%)
24                  15      3.4%
22                  15      3.4%
36                  14      3.1%
30                  12      2.7%
29                  12      2.7%
28                  12      2.7%
31                  12      2.7%
18                  12      2.7%
19                  12      2.7%
33                  11      2.5%
Other values (69)   225     50.4%
(Missing)           94      21.1%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0.83    1       0.2%
0.92    1       0.2%
1       2       0.4%
2       6       1.3%
3       6       1.3%
4       4       0.9%
6       2       0.4%
7       2       0.4%
8       1       0.2%
9       5       1.1%

Dataset B
Value   Count   Frequency (%)
0.42    1       0.2%
0.67    1       0.2%
0.75    2       0.4%
0.83    1       0.2%
0.92    1       0.2%
1       5       1.1%
2       3       0.7%
3       3       0.7%
4       5       1.1%
5       2       0.4%

SibSp
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.54035874   0.55829596
Minimum        0            0
Maximum        8            8
Zeros          300          302
Zeros (%)      67.3%        67.7%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.0692574       1.1572772
Coefficient of variation (CV)     1.9787917       2.0728741
Kurtosis                          13.655148       15.993364
Mean                              0.54035874      0.55829596
Median Absolute Deviation (MAD)   0               0
Skewness                          3.2015606       3.5058105
Sum                               241             249
Variance                          1.1433113       1.3392906
Monotonicity                      Not monotonic   Not monotonic
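
SibSp has only seven distinct values, so its value counts describe the whole column and the descriptive statistics can be rebuilt from them. This sketch expands the Dataset A counts and recomputes the statistics with Python's statistics module. It assumes the reported variance is the sample variance (n - 1 denominator), which is the reading that reproduces 1.1433113.

```python
import statistics

# SibSp value counts for Dataset A, copied from the tables in this section.
counts_a = {0: 300, 1: 104, 2: 16, 3: 11, 4: 9, 5: 4, 8: 2}
values = [v for v, c in counts_a.items() for _ in range(c)]

n = len(values)
total = sum(values)                      # Sum
mean = statistics.mean(values)           # Mean
variance = statistics.variance(values)   # sample variance (n - 1)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation

print(n, total, round(mean, 8), round(variance, 7), round(cv, 7))
```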
(Figure: histogram with fixed size bins, bins=7)

Common values

Dataset A
Value   Count   Frequency (%)
0       300     67.3%
1       104     23.3%
2       16      3.6%
3       11      2.5%
4       9       2.0%
5       4       0.9%
8       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       302     67.7%
1       101     22.6%
2       15      3.4%
3       12      2.7%
4       10      2.2%
8       4       0.9%
5       2       0.4%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       300     67.3%
1       104     23.3%
2       16      3.6%
3       11      2.5%
4       9       2.0%
5       4       0.9%
8       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       302     67.7%
1       101     22.6%
2       15      3.4%
3       12      2.7%
4       10      2.2%
5       2       0.4%
8       4       0.9%

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            6
Distinct (%)   1.6%         1.3%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.39686099   0.4058296
Minimum        0            0
Maximum        6            5
Zeros          341          334
Zeros (%)      76.5%        74.9%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0.75
95-th percentile            2           2
Maximum                     6           5
Range                       6           5
Interquartile range (IQR)   0           0.75

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.86213167      0.84739501
Coefficient of variation (CV)     2.172377        2.0880562
Kurtosis                          10.403407       9.7373855
Mean                              0.39686099      0.4058296
Median Absolute Deviation (MAD)   0               0
Skewness                          2.8791806       2.7902118
Sum                               177             181
Variance                          0.74327102      0.7180783
Monotonicity                      Not monotonic   Not monotonic
(Figure: histogram with fixed size bins, bins=7)

Common values

Dataset A
Value   Count   Frequency (%)
0       341     76.5%
1       55      12.3%
2       40      9.0%
5       3       0.7%
3       3       0.7%
4       3       0.7%
6       1       0.2%

Dataset B
Value   Count   Frequency (%)
0       334     74.9%
1       63      14.1%
2       41      9.2%
5       5       1.1%
4       2       0.4%
3       1       0.2%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       341     76.5%
1       55      12.3%
2       40      9.0%
3       3       0.7%
4       3       0.7%
5       3       0.7%
6       1       0.2%

Dataset B
Value   Count   Frequency (%)
0       334     74.9%
1       63      14.1%
2       41      9.2%
3       1       0.2%
4       2       0.4%
5       5       1.1%

Ticket
Text

               Dataset A   Dataset B
Distinct       376         377
Distinct (%)   84.3%       84.5%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   6           17
Mean length     6.661435    6.7488789
Min length      3           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2971        3010
Distinct characters   32          35
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       325         325
Unique (%)   72.9%       72.9%

Sample

          Dataset A   Dataset B
1st row   350029      347742
2nd row   382652      2631
3rd row   2648        250649
4th row   2678        223596
5th row   370372      36947

Words

Dataset A
Value                Count   Frequency (%)
pc                   30      5.4%
c.a                  12      2.2%
ca                   8       1.4%
a/5                  7       1.3%
2144                 5       0.9%
c                    5       0.9%
ston/o               5       0.9%
2                    5       0.9%
1601                 5       0.9%
sc/paris             5       0.9%
Other values (392)   468     84.3%

Dataset B
Value                Count   Frequency (%)
pc                   27      4.8%
c.a                  14      2.5%
a/5                  8       1.4%
ca                   7       1.2%
ston/o               7       1.2%
2                    7       1.2%
a/4                  6       1.1%
3101295              5       0.9%
sc/paris             5       0.9%
w./c                 5       0.9%
Other values (398)   477     84.0%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
3                   374     12.6%
1                   346     11.6%
2                   291     9.8%
7                   248     8.3%
4                   238     8.0%
6                   213     7.2%
0                   209     7.0%
9                   186     6.3%
5                   180     6.1%
8                   141     4.7%
Other values (22)   545     18.3%

Dataset B
Value               Count   Frequency (%)
3                   360     12.0%
1                   320     10.6%
2                   285     9.5%
4                   253     8.4%
7                   245     8.1%
6                   217     7.2%
0                   205     6.8%
5                   199     6.6%
9                   181     6.0%
8                   130     4.3%
Other values (25)   615     20.4%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   2971    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3010    100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   2971    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3010    100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   2971    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3010    100.0%

Fare
Real number (ℝ)

               Dataset A   Dataset B
Distinct       169         180
Distinct (%)   37.9%       40.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           31.779016   33.132539
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          10          9
Zeros (%)      2.2%        2.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.06875     7.225
Q1                          7.8958      7.9031
median                      14.4542     14.5
Q3                          30.92395    29.925
95-th percentile            112.67708   118.31875
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.02815    22.0219

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                50.310416       54.521731
Coefficient of variation (CV)     1.5831332       1.6455645
Kurtosis                          40.129611       30.294074
Mean                              31.779016       33.132539
Median Absolute Deviation (MAD)   7.0813          6.8125
Skewness                          5.2875588       4.7158345
Sum                               14173.441       14777.113
Variance                          2531.1379       2972.6192
Monotonicity                      Not monotonic   Not monotonic
(Figure: histogram with fixed size bins, bins=50)

Common values

Dataset A
Value                Count   Frequency (%)
13                   24      5.4%
7.8958               19      4.3%
7.75                 18      4.0%
8.05                 16      3.6%
10.5                 13      2.9%
26                   13      2.9%
7.775                11      2.5%
7.925                10      2.2%
0                    10      2.2%
26.55                10      2.2%
Other values (159)   302     67.7%

Dataset B
Value                Count   Frequency (%)
7.8958               21      4.7%
8.05                 19      4.3%
13                   17      3.8%
26                   16      3.6%
7.75                 15      3.4%
10.5                 11      2.5%
0                    9       2.0%
26.55                9       2.0%
7.925                9       2.0%
8.6625               8       1.8%
Other values (170)   312     70.0%

Minimum 10 values

Dataset A
Value    Count   Frequency (%)
0        10      2.2%
4.0125   1       0.2%
5        1       0.2%
6.2375   1       0.2%
6.4375   1       0.2%
6.45     1       0.2%
6.4958   2       0.4%
6.75     1       0.2%
6.95     1       0.2%
6.975    1       0.2%

Dataset B
Value    Count   Frequency (%)
0        9       2.0%
5        1       0.2%
6.2375   1       0.2%
6.4375   1       0.2%
6.45     1       0.2%
6.75     1       0.2%
6.8583   1       0.2%
6.975    2       0.4%
7.05     2       0.4%
7.125    3       0.7%

Cabin
Text

               Dataset A   Dataset B
Distinct       77          76
Distinct (%)   80.2%       82.6%
Missing        350         354
Missing (%)    78.5%       79.4%
Memory size    7.0 KiB     7.0 KiB
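
Distinct (%) here uses the number of non-missing values as its denominator: Cabin in Dataset A has 77 distinct values among 446 - 350 = 96 present values, giving 80.2% rather than 77 / 446. A small sketch of that reading, checked against the Cabin and Age figures in this report; distinct_pct is an illustrative name and the definition is inferred from the numbers, not from ydata-profiling's source.

```python
def distinct_pct(n_distinct: int, n_rows: int, n_missing: int) -> float:
    """Distinct values as a percentage of non-missing values."""
    return 100.0 * n_distinct / (n_rows - n_missing)


cabin_a = distinct_pct(77, 446, 350)
cabin_b = distinct_pct(76, 446, 354)
age_a = distinct_pct(72, 446, 82)
age_b = distinct_pct(79, 446, 94)

print(f"Cabin: {cabin_a:.1f}% / {cabin_b:.1f}%   Age: {age_a:.1f}% / {age_b:.1f}%")
```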

Length

                Dataset A   Dataset B
Max length      11          11
Median length   3           3
Mean length     3.65625     3.6630435
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      351         337
Distinct characters   19          18
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       60          64
Unique (%)   62.5%       69.6%

Sample

          Dataset A   Dataset B
1st row   C7          D20
2nd row   F33         F33
3rd row   C126        B73
4th row   B49         A7
5th row   C78         B5

Words

Dataset A
Value               Count   Frequency (%)
b96                 3       2.7%
b98                 3       2.7%
c23                 3       2.7%
c25                 3       2.7%
c27                 3       2.7%
b49                 2       1.8%
c78                 2       1.8%
c126                2       1.8%
c22                 2       1.8%
c26                 2       1.8%
Other values (76)   88      77.9%

Dataset B
Value               Count   Frequency (%)
c23                 4       3.6%
c25                 4       3.6%
c27                 4       3.6%
f33                 3       2.7%
g6                  3       2.7%
b35                 2       1.8%
c83                 2       1.8%
b58                 2       1.8%
b60                 2       1.8%
c52                 2       1.8%
Other values (75)   82      74.5%

Most occurring characters

Dataset A
Value              Count   Frequency (%)
C                  40      11.4%
3                  37      10.5%
2                  32      9.1%
B                  29      8.3%
1                  26      7.4%
6                  24      6.8%
8                  21      6.0%
9                  20      5.7%
(space)            17      4.8%
0                  16      4.6%
Other values (9)   89      25.4%

Dataset B
Value              Count   Frequency (%)
2                  36      10.7%
C                  35      10.4%
B                  33      9.8%
3                  31      9.2%
5                  26      7.7%
1                  21      6.2%
6                  20      5.9%
7                  18      5.3%
0                  18      5.3%
(space)            18      5.3%
Other values (8)   81      24.0%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   351     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   337     100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   351     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   337     100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   351     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   337     100.0%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           1
Missing (%)    0.2%        0.2%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           S
2nd row   Q           C
3rd row   C           S
4th row   C           S
5th row   Q           C

Common Values

Dataset A
Value       Count   Frequency (%)
S           323     72.4%
C           82      18.4%
Q           40      9.0%
(Missing)   1       0.2%

Dataset B
Value       Count   Frequency (%)
S           330     74.0%
C           79      17.7%
Q           36      8.1%
(Missing)   1       0.2%

Length

(Figure: histogram of lengths of the category)

Common Values (Plot)

(Figures: common values plots for Dataset A and Dataset B)
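
Words (lowercased; the missing value is excluded, so percentages are out of 445)

Dataset A
Value   Count   Frequency (%)
s       323     72.6%
c       82      18.4%
q       40      9.0%

Dataset B
Value   Count   Frequency (%)
s       330     74.2%
c       79      17.8%
q       36      8.1%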

Most occurring characters

Dataset A
Value   Count   Frequency (%)
S       323     72.6%
C       82      18.4%
Q       40      9.0%

Dataset B
Value   Count   Frequency (%)
S       330     74.2%
C       79      17.8%
Q       36      8.1%
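
The Common Values and character percentages for Embarked differ slightly (72.4% versus 72.6% for S in Dataset A) because they use different denominators: Common Values includes the one missing row among all 446 rows, while the character table only counts the 445 characters actually present. A sketch with a hypothetical reconstruction of the Dataset A column, using None for the missing value:

```python
from collections import Counter

# Hypothetical reconstruction of Embarked, Dataset A: 323 S, 82 C, 40 Q, 1 missing.
embarked_a = ["S"] * 323 + ["C"] * 82 + ["Q"] * 40 + [None]

# Common Values: every row counts, including the missing one.
rows = len(embarked_a)
value_pct = {v: round(100 * c / rows, 1) for v, c in Counter(embarked_a).items()}

# Character table: only characters from non-missing values are counted.
chars = [ch for v in embarked_a if v is not None for ch in v]
char_pct = {ch: round(100 * c / len(chars), 1) for ch, c in Counter(chars).items()}

print(value_pct)
print(char_pct)
```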

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   445     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Interactions

(Figures: interaction plots, each shown for Dataset A and Dataset B)

2025-03-18T21:18:21.012813image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-18T21:18:19.372311image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-18T21:18:21.306213image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

Dataset A and Dataset B: correlation heatmaps (images not reproduced in this text version).

Dataset A

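Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.071 | 0.115 | -0.262 | 0.031 | 0.268 | 0.080 | -0.185 | 0.102
Embarked | 0.071 | 1.000 | 0.193 | 0.038 | 0.000 | 0.253 | 0.054 | 0.111 | 0.039
Fare | 0.115 | 0.193 | 1.000 | 0.393 | -0.035 | 0.469 | 0.147 | 0.473 | 0.338
Parch | -0.262 | 0.038 | 0.393 | 1.000 | 0.002 | 0.000 | 0.297 | 0.422 | 0.154
PassengerId | 0.031 | 0.000 | -0.035 | 0.002 | 1.000 | 0.000 | 0.083 | -0.106 | 0.000
Pclass | 0.268 | 0.253 | 0.469 | 0.000 | 0.000 | 1.000 | 0.088 | 0.137 | 0.384
Sex | 0.080 | 0.054 | 0.147 | 0.297 | 0.083 | 0.088 | 1.000 | 0.240 | 0.527
SibSp | -0.185 | 0.111 | 0.473 | 0.422 | -0.106 | 0.137 | 0.240 | 1.000 | 0.203
Survived | 0.102 | 0.039 | 0.338 | 0.154 | 0.000 | 0.384 | 0.527 | 0.203 | 1.000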

Dataset B

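Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.089 | 0.055 | -0.240 | 0.018 | 0.192 | 0.131 | -0.240 | 0.124
Embarked | 0.089 | 1.000 | 0.215 | 0.000 | 0.000 | 0.232 | 0.024 | 0.024 | 0.154
Fare | 0.055 | 0.215 | 1.000 | 0.431 | 0.006 | 0.468 | 0.216 | 0.426 | 0.256
Parch | -0.240 | 0.000 | 0.431 | 1.000 | -0.026 | 0.033 | 0.273 | 0.433 | 0.159
PassengerId | 0.018 | 0.000 | 0.006 | -0.026 | 1.000 | 0.000 | 0.000 | -0.073 | 0.112
Pclass | 0.192 | 0.232 | 0.468 | 0.033 | 0.000 | 1.000 | 0.162 | 0.112 | 0.391
Sex | 0.131 | 0.024 | 0.216 | 0.273 | 0.000 | 0.162 | 1.000 | 0.183 | 0.587
SibSp | -0.240 | 0.024 | 0.426 | 0.433 | -0.073 | 0.112 | 0.183 | 1.000 | 0.140
Survived | 0.124 | 0.154 | 0.256 | 0.159 | 0.112 | 0.391 | 0.587 | 0.140 | 1.000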
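The Alerts section flags Sex and Survived as highly correlated in both datasets. As a rough sketch of how that flag relates to the two matrices above, the code below copies both tables and lists every pair whose absolute coefficient exceeds a cut-off. The 0.5 threshold is an assumption for illustration, not a value read from this report's configuration, and the helper names are hypothetical. It also reports which pair changes most between the two datasets.

```python
from itertools import combinations

# Column order of both correlation tables above.
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]

# Values copied from the Dataset A and Dataset B tables (rounded to 3 decimals).
MATRIX_A = [
    [1.000, 0.071, 0.115, -0.262, 0.031, 0.268, 0.080, -0.185, 0.102],
    [0.071, 1.000, 0.193, 0.038, 0.000, 0.253, 0.054, 0.111, 0.039],
    [0.115, 0.193, 1.000, 0.393, -0.035, 0.469, 0.147, 0.473, 0.338],
    [-0.262, 0.038, 0.393, 1.000, 0.002, 0.000, 0.297, 0.422, 0.154],
    [0.031, 0.000, -0.035, 0.002, 1.000, 0.000, 0.083, -0.106, 0.000],
    [0.268, 0.253, 0.469, 0.000, 0.000, 1.000, 0.088, 0.137, 0.384],
    [0.080, 0.054, 0.147, 0.297, 0.083, 0.088, 1.000, 0.240, 0.527],
    [-0.185, 0.111, 0.473, 0.422, -0.106, 0.137, 0.240, 1.000, 0.203],
    [0.102, 0.039, 0.338, 0.154, 0.000, 0.384, 0.527, 0.203, 1.000],
]
MATRIX_B = [
    [1.000, 0.089, 0.055, -0.240, 0.018, 0.192, 0.131, -0.240, 0.124],
    [0.089, 1.000, 0.215, 0.000, 0.000, 0.232, 0.024, 0.024, 0.154],
    [0.055, 0.215, 1.000, 0.431, 0.006, 0.468, 0.216, 0.426, 0.256],
    [-0.240, 0.000, 0.431, 1.000, -0.026, 0.033, 0.273, 0.433, 0.159],
    [0.018, 0.000, 0.006, -0.026, 1.000, 0.000, 0.000, -0.073, 0.112],
    [0.192, 0.232, 0.468, 0.033, 0.000, 1.000, 0.162, 0.112, 0.391],
    [0.131, 0.024, 0.216, 0.273, 0.000, 0.162, 1.000, 0.183, 0.587],
    [-0.240, 0.024, 0.426, 0.433, -0.073, 0.112, 0.183, 1.000, 0.140],
    [0.124, 0.154, 0.256, 0.159, 0.112, 0.391, 0.587, 0.140, 1.000],
]


def flag_high_correlations(matrix, columns, threshold=0.5):
    """Hypothetical helper: (col_i, col_j, value) for pairs with |value| > threshold."""
    flagged = []
    # combinations() yields each unordered pair once and skips the diagonal.
    for i, j in combinations(range(len(columns)), 2):
        if abs(matrix[i][j]) > threshold:
            flagged.append((columns[i], columns[j], matrix[i][j]))
    return flagged


def largest_shift(matrix_a, matrix_b, columns):
    """Hypothetical helper: the pair whose coefficient differs most between A and B."""
    i, j = max(combinations(range(len(columns)), 2),
               key=lambda p: abs(matrix_a[p[0]][p[1]] - matrix_b[p[0]][p[1]]))
    return columns[i], columns[j], round(matrix_b[i][j] - matrix_a[i][j], 3)


print(flag_high_correlations(MATRIX_A, COLUMNS))
print(flag_high_correlations(MATRIX_B, COLUMNS))
print(largest_shift(MATRIX_A, MATRIX_B, COLUMNS))
```

With these values and the assumed threshold, only Sex/Survived is flagged (0.527 in Dataset A, 0.587 in Dataset B), which agrees with the alerts. The largest difference between the datasets is Embarked/Survived, which rises from 0.039 to 0.154.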

Missing values

Dataset A and Dataset B (plots not reproduced): a simple visualization of nullity by column.
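The counts behind the nullity-by-column view are plain per-column tallies. A minimal standard-library sketch, assuming each record is a dict and a missing value is stored as None (both helper names are hypothetical). Rounding to one decimal matches how percentages appear in this report: 82 missing Age values out of 446 rows give 18.4%, as in the Dataset A alert.

```python
def missing_percent(count, total):
    """Missing share as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1) if total else 0.0


def missing_summary(rows, columns):
    """Return {column: (missing count, missing %)} for records stored as dicts.

    A value counts as missing when it is None or the key is absent.
    """
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        summary[col] = (missing, missing_percent(missing, len(rows)))
    return summary


# Four made-up records, not taken from either dataset.
records = [
    {"Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 38.0, "Cabin": "C7", "Embarked": "C"},
    {"Age": None, "Cabin": None, "Embarked": "S"},
]
print(missing_summary(records, ["Age", "Cabin", "Embarked"]))
# {'Age': (2, 50.0), 'Cabin': (3, 75.0), 'Embarked': (0, 0.0)}
```

The same helper reproduces the overview figures: `missing_percent(433, 446 * 12)` gives 8.1, Dataset A's share of missing cells across 446 rows and 12 variables.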

Dataset A and Dataset B (plots not reproduced): the nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
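In a nullity matrix each record is a row and each variable a column, filled where a value is present. A text-only sketch of the same layout, assuming dict records with None for missing values; the '#'/'.' characters and the helper name `nullity_matrix` are arbitrary choices, not the report's rendering.

```python
def nullity_matrix(rows, columns):
    """One string per record: '#' where a value is present, '.' where it is missing."""
    return [
        "".join("." if row.get(col) is None else "#" for col in columns)
        for row in rows
    ]


# Age, Cabin and Embarked for three Dataset A sample rows
# (Hansen, McMahon, Wick), with NaN written as None.
sample = [
    {"Age": 21.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 31.0, "Cabin": "C7", "Embarked": "S"},
]
for line in nullity_matrix(sample, ["Age", "Cabin", "Embarked"]):
    print(line)
```

Read column by column, the output shows Cabin as the sparsest variable and Embarked as complete, the same pattern the alerts report for the full datasets.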

Dataset A and Dataset B (plots not reproduced): the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.
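Nullity correlation can be read as the correlation between missingness indicators: 1 where a value is missing, 0 where it is present. The sketch below assumes that definition (Pearson correlation of the indicators; this is an assumption about how the heatmap is computed, not taken from the report) and uses only the standard library. `nullity_correlation` is a hypothetical name.

```python
from math import sqrt


def nullity_correlation(rows, col_x, col_y):
    """Pearson correlation between the missingness indicators of two columns.

    Each record contributes 1.0 if the value is missing (None) and 0.0 otherwise.
    Returns None when either indicator is constant, because the coefficient is
    undefined for a column that is never (or always) missing.
    """
    if not rows:
        return None
    x = [1.0 if row.get(col_x) is None else 0.0 for row in rows]
    y = [1.0 if row.get(col_y) is None else 0.0 for row in rows]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Made-up records: Age and Cabin go missing together; Embarked is always present.
records = [
    {"Age": None, "Cabin": None, "Embarked": "S"},
    {"Age": 35.0, "Cabin": "B50", "Embarked": "C"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 19.0, "Cabin": "F33", "Embarked": "S"},
]
print(nullity_correlation(records, "Age", "Cabin"))     # 1.0
print(nullity_correlation(records, "Age", "Embarked"))  # None
```

A value near 1 means two variables tend to be missing together, near -1 means one tends to be present exactly when the other is missing, and None marks a column with no variation in missingness.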

Sample

Dataset A

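Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
623 | 624 | 0 | 3 | Hansen, Mr. Henry Damsgaard | male | 21.0 | 0 | 0 | 350029 | 7.8542 | NaN | S
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q
378 | 379 | 0 | 3 | Betros, Mr. Tannous | male | 20.0 | 0 | 0 | 2648 | 4.0125 | NaN | C
852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C
126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN | 0 | 0 | 370372 | 7.7500 | NaN | Q
318 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S
516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S
234 | 235 | 0 | 2 | Leyson, Mr. Robert William Norman | male | 24.0 | 0 | 0 | C.A. 29566 | 10.5000 | NaN | S
85 | 86 | 1 | 3 | Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson) | female | 33.0 | 3 | 0 | 3101278 | 15.8500 | NaN | S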

Dataset B

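Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.00 | 1 | 1 | 347742 | 11.1333 | NaN | S
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C
755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5000 | NaN | S
706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.00 | 0 | 0 | 223596 | 13.5000 | NaN | S
591 | 592 | 1 | 1 | Stephenson, Mrs. Walter Bertram (Martha Eustis) | female | 52.00 | 1 | 0 | 36947 | 78.2667 | D20 | C
797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.00 | 0 | 0 | 349244 | 8.6833 | NaN | S
147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.00 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S
423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.00 | 1 | 1 | 347080 | 14.4000 | NaN | S
517 | 518 | 0 | 3 | Ryan, Mr. Patrick | male | NaN | 0 | 0 | 371110 | 24.1500 | NaN | Q
791 | 792 | 0 | 2 | Gaskell, Mr. Alfred | male | 16.00 | 0 | 0 | 239865 | 26.0000 | NaN | S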

Dataset A

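Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
564 | 565 | 0 | 3 | Meanwell, Miss. (Marion Ogden) | female | NaN | 0 | 0 | SOTON/O.Q. 392087 | 8.0500 | NaN | S
360 | 361 | 0 | 3 | Skoog, Mr. Wilhelm | male | 40.0 | 1 | 4 | 347088 | 27.9000 | NaN | S
620 | 621 | 0 | 3 | Yasbeck, Mr. Antoni | male | 27.0 | 1 | 0 | 2659 | 14.4542 | NaN | C
113 | 114 | 0 | 3 | Jussila, Miss. Katriina | female | 20.0 | 1 | 0 | 4136 | 9.8250 | NaN | S
354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C
269 | 270 | 1 | 1 | Bissette, Miss. Amelia | female | 35.0 | 0 | 0 | PC 17760 | 135.6333 | C99 | S
347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S
565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S
349 | 350 | 0 | 3 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 8.6625 | NaN | S
571 | 572 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S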

Dataset B

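Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
676 | 677 | 0 | 3 | Sawyer, Mr. Frederick Charles | male | 24.5 | 0 | 0 | 342826 | 8.0500 | NaN | S
700 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.0 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S
856 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45.0 | 1 | 1 | 36928 | 164.8667 | NaN | S
39 | 40 | 1 | 3 | Nicola-Yarred, Miss. Jamila | female | 14.0 | 1 | 0 | 2651 | 11.2417 | NaN | C
546 | 547 | 1 | 2 | Beane, Mrs. Edward (Ethel Clarke) | female | 19.0 | 1 | 0 | 2908 | 26.0000 | NaN | S
632 | 633 | 1 | 1 | Stahelin-Maeglin, Dr. Max | male | 32.0 | 0 | 0 | 13214 | 30.5000 | B50 | C
472 | 473 | 1 | 2 | West, Mrs. Edwy Arthur (Ada Mary Worth) | female | 33.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S
624 | 625 | 0 | 3 | Bowen, Mr. David John "Dai" | male | 21.0 | 0 | 0 | 54636 | 16.1000 | NaN | S
219 | 220 | 0 | 2 | Harris, Mr. Walter | male | 30.0 | 0 | 0 | W/C 14208 | 10.5000 | NaN | S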

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.
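Both datasets report no duplicate rows. A minimal sketch of the check with `collections.Counter`, assuming each record is a tuple of column values and that the "# duplicates" column counts how often a repeated record occurs (both are assumptions; `duplicate_rows` is a hypothetical helper).

```python
from collections import Counter


def duplicate_rows(rows):
    """Return (row, occurrences) for each row seen more than once, most frequent first.

    Rows must be hashable tuples. Use None for missing values: distinct
    float('nan') objects never compare equal, so rows containing NaN could be
    missed as duplicates.
    """
    counts = Counter(rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Made-up rows of (Survived, Pclass, Sex).
rows = [(0, 3, "male"), (1, 1, "female"), (0, 3, "male"), (1, 3, "female")]
print(duplicate_rows(rows))      # [((0, 3, 'male'), 2)]
print(duplicate_rows(rows[1:]))  # []
```

Because PassengerId has unique values in both datasets, no two full rows can be identical, so an empty result is expected for this report.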